Context feature extraction method of terrorism behavior based on dependence maximization

XUE Anrong, JIA Xiaoyan, GE Qinglong, YANG Xiaoqin

Journal of Computer Applications 2015, 35 (3): 797-801. DOI: 10.11772/j.issn.1001-9081.2015.03.797

To address the missing-value problem in terrorism behavior datasets, this paper proposed the Compressed Context Space (CCS) method, which is based on maximizing the dependence between context vectors and actions. CCS relies on the Hilbert-Schmidt Independence Criterion (HSIC), which evaluates the relationship between two variables through a Hilbert-Schmidt norm, and this norm has been proven theoretically to detect dependence. To capture this relevance and maximize the dependence between context features and actions, CCS maximizes the Hilbert-Schmidt norm between the linearly mapped low-dimensional features and the actions, which reduces the effect of missing values. Applying CCS followed by a Support Vector Machine classifier (CCS+SVM) yields effective classification. Experiments on the MAROB dataset show that CCS+SVM outperforms SVM, PCA+SVM, CCA+SVM and CONVEX by at least 1.5% in recall and 1.0% in F-measure, and is competitive with the best results in precision and Area Under the ROC Curve (AUC). These results show that CCS+SVM handles the missing-value problem well.
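The abstract does not give CCS's optimization procedure, but the quantity it maximizes can be illustrated. The sketch below computes the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)², using a Gaussian kernel on linearly projected features and a delta kernel on action labels, and compares two fixed projection directions. The toy data, the kernel choices, `sigma`, and the helper names (`gaussian_gram`, `delta_gram`, `center`, `hsic`, `project`) are illustrative assumptions; the sketch does not search for an optimal projection, handle missing values, or reproduce the paper's method.

```python
import math

def gaussian_gram(rows, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel over a list of feature vectors."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))
             for v in rows] for u in rows]

def delta_gram(labels):
    """Delta kernel on discrete action labels: 1 if the labels are equal, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]

def center(M):
    """Return H M H with H = I - 11^T / n (double centring of a square matrix)."""
    n = len(M)
    row = [sum(r) / n for r in M]
    col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[M[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    # tr(HKH L) equals tr(K H L H) by cyclicity of the trace
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2

def project(X, w):
    """Hypothetical linear map to one dimension: each row x becomes [x . w]."""
    return [[sum(a * b for a, b in zip(x, w))] for x in X]

# Hypothetical toy data: feature 0 separates the two actions, feature 1 is noise.
X = [(0.0, 0.3), (0.2, -0.4), (0.1, 0.5), (2.0, 0.4), (2.2, -0.3), (1.9, -0.5)]
labels = [0, 0, 0, 1, 1, 1]
L = delta_gram(labels)
h_good = hsic(gaussian_gram(project(X, (1.0, 0.0))), L)
h_bad = hsic(gaussian_gram(project(X, (0.0, 1.0))), L)
print(h_good > h_bad)  # True: the informative direction depends more on the actions
```

A projection that separates the actions yields a much larger HSIC than one aligned with noise; CCS, as described, would search over linear maps for one that maximizes this score.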

Privacy preserving clustering algorithm based on wavelet transform for distributed data

XUE Anrong, LIU Bin, WEN Dandan

Journal of Computer Applications 2014, 34 (4): 1029-1033. DOI: 10.11772/j.issn.1001-9081.2014.04.1029

Existing privacy-preserving clustering algorithms cannot strike a good trade-off between efficiency and privacy. To resolve this problem, a distributed privacy-preserving clustering algorithm based on Secure Multi-party Computation (SMC) combined with data perturbation was proposed. Data owners used the wavelet transform to achieve both data reduction and information hiding, and randomly rearranged the attribute columns to prevent data reconstruction, which could otherwise lead to information disclosure. Because the algorithm computes only on the reduced data, it lowers computation and communication costs and thus improves efficiency. At the same time, combining multiple protection measures in the computation effectively preserves data privacy, while the high reliability of the wavelet transform means clustering accuracy is only slightly affected. Theoretical analysis and experimental results indicate that the proposed algorithm is secure and highly effective, and that its overall F-measure and efficiency outperform those of the DCT-H (Discrete Cosine Transform-Haar) algorithm on high-dimensional datasets. Overall, it effectively resolves the trade-off between efficiency and privacy.
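As an illustration of why wavelet-based reduction can leave distance-based clustering largely intact, the sketch below applies one level of the orthonormal Haar transform to each record, keeps only the approximation coefficients, and then applies a single secret column permutation shared by all records. This is an assumed simplification: the abstract does not specify the wavelet, the decomposition level, the order of reduction and permutation, or the SMC protocol, and the function names and the `seed` parameter are hypothetical.

```python
import math
import random

def haar_level(row):
    """One level of the orthonormal Haar transform (row length must be even)."""
    s = math.sqrt(2.0)
    approx = [(row[i] + row[i + 1]) / s for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / s for i in range(0, len(row), 2)]
    return approx, detail

def reduce_and_permute(records, seed):
    """Keep only the approximation coefficients (halving the dimensionality),
    then apply one secret random column permutation to every record."""
    reduced = [haar_level(r)[0] for r in records]
    perm = list(range(len(reduced[0])))
    random.Random(seed).shuffle(perm)  # hypothetical: the seed stands in for the owner's secret
    return [[r[p] for p in perm] for r in reduced]

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

records = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 5.0, 1.0], [9.0, 8.0, 7.0, 6.0]]
released = reduce_and_permute(records, seed=42)
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        # original distance vs. distance on the reduced, permuted records
        print(i, j, round(dist(records[i], records[j]), 3),
              round(dist(released[i], released[j]), 3))
```

Because the Haar transform is orthonormal, distances computed from all coefficients equal the original distances, dropping the detail coefficients can only shrink them, and a column permutation applied identically to every record leaves them unchanged.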

Speeding up outlier detection in large-scale datasets

XUE Anrong, WEN Dandan, LIU Bin

Journal of Computer Applications 2013, 33 (11): 3057-3061.

Existing distance-based outlier detection algorithms suffer from low efficiency on large-scale datasets. To relieve this problem, a distributed outlier detection algorithm based on clustering and indexing (DODCI) was presented. The algorithm partitioned the original dataset into clusters using a clustering method, then built an index for each cluster in parallel on each distributed node. Afterwards, each node detected outliers iteratively using two optimization strategies and two pruning rules. Experimental results on a synthetic dataset and preprocessed KDD CUP datasets show that, when the dataset is large enough, the proposed algorithm is up to almost an order of magnitude faster than two existing algorithms, Orca and iDOoR. The theoretical and experimental analyses show that the proposed algorithm effectively speeds up outlier detection in large-scale datasets.
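The abstract does not spell out DODCI's two optimization strategies or two pruning rules. For orientation, the sketch below shows the kind of nested-loop, cutoff-based pruning used by Orca-style distance-based detectors on a single node: a point's outlier score is its distance to its k-th nearest neighbor, and a candidate is abandoned once its running k-NN distance falls below the weakest score in the current top-n. Clustering, indexing, and distribution across nodes are not shown, and `top_n_outliers` is a hypothetical helper, not DODCI itself.

```python
import heapq
import math

def top_n_outliers(points, k, n):
    """Return [(score, index), ...] for the n points with the largest distance
    to their k-th nearest neighbour, highest score first (requires k < len(points))."""
    cutoff = 0.0  # weakest score in the current top-n; 0 until n scores are known
    top = []      # min-heap of (score, index)
    for i, p in enumerate(points):
        neigh = []  # max-heap (negated distances) of the k smallest distances so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # with k neighbours known, -neigh[0] upper-bounds the final k-NN distance,
            # so once it drops below the cutoff this point cannot enter the top-n
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neigh[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (5, -5)]
result = top_n_outliers(points, k=2, n=2)
print([i for _, i in result])  # [5, 6]
```

With these points, the pruning rule discards (0.5, 0.5) after only two distance computations, since its running 2-NN distance already falls below the cutoff of 1 set by the first two cluster points.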